79 research outputs found

    Run-time Resource Management in CMPs Handling Multiple Aging Mechanisms

    Run-time resource management is fundamental for the efficient execution of workloads on Chip Multiprocessors (CMPs). Application- and system-level requirements (e.g. on performance vs. power vs. lifetime reliability) generally conflict with each other, and any decision on resource assignment, such as core allocation or frequency tuning, may benefit some of them while penalizing others. The effects of resource assignment decisions on performance and power consumption can be perceived within a few instants, but their effects on lifetime reliability cannot. In fact, the latter changes very slowly, based on the accumulated effects of many decisions over a long time horizon. Moreover, aging mechanisms are varied and have different causes: most of them, such as Electromigration (EM), depend on temperature levels, while Thermal Cycling (TC) is caused mainly by temperature variations (both amplitude and frequency). Mitigating only EM may negatively affect TC and vice versa. We propose a resource orchestration strategy that balances performance and power consumption constraints in the short term and EM and TC aging in the long term. Experimental results show that the proposed approach improves the average Mean Time To Failure by at least 17% and 20% w.r.t. EM and TC, respectively, while providing the same performance level as the nominal counterpart and guaranteeing the power budget.

    SB-Router: A Swapped Buffer Activated Low Latency Network-on-Chip Router

    Switch Allocation (SA) is a critical stage in Network-on-Chip (NoC) routers, and its performance is adversely affected by Head-of-Line (HoL) blocking. In traditional Input-Queued Routers (IQR), packets are arranged in a particular order in each Virtual Channel (VC). This implementation is vulnerable to HoL blocking, as the switch allocator can allocate only those packets that are at the head of a VC. In this paper, a Swapped Buffer (SB) Router architecture is proposed to schedule packets in input buffers by using SB registers. The VCs are designed as SBs, which allows the packets stored in SB registers, along with the head packet of the VC, to participate in SA. The SB register minimizes conflicts in SA and thus reduces HoL blocking, thereby improving the performance of the NoC. This paper proposes a priority mechanism that prioritizes non-head packets over head packets in case of conflict between them. Two methods are proposed to enhance the performance of the NoC router. First, a VC allocation technique is proposed to optimize the order of packets in the input buffer. Next, the SB-Router is combined with the Fill VC allocation technique to further enhance the performance of NoC routers. The performance of the proposed router is evaluated, and the experimental results indicate that our design achieves a latency improvement of 68.75% over the (Time-Series) TS-Router for uniform traffic at an injection rate of 0.42 flits/cycle on a 64-node mesh network, with moderate power consumption and area usage. The improvement in packet latency for traces from the Princeton Application Repository for Shared-Memory Computers (PARSEC) has also been evaluated. With the achieved reduction in latency, the proposed method has the potential to serve high-speed operations while mapping different applications onto multi-core architectures.

    High-performance long NoC link using delay-insensitive current-mode signaling

    A high-performance long-range NoC link enables efficient implementation of network-on-chip topologies that inherently require high-performance long-distance point-to-point communication, such as torus and fat-tree structures. In addition, the performance of other topologies, such as mesh, can be improved by using a high-performance link between a few selected remote nodes. We present a novel implementation of a high-performance long-range NoC link based on multilevel current-mode signaling and delay-insensitive two-phase 1-of-4 encoding. Current-mode signaling significantly reduces the communication latency of long wires compared to voltage-mode signaling, making it possible to achieve high throughput without pipelining and/or using repeaters. The performance of the proposed multilevel current-mode interconnect is analyzed and compared with two reference voltage-mode interconnects. These two reference interconnects are designed using two-phase 1-of-4 encoded voltage-mode signaling, one with pipeline stages and the other using optimal repeater insertion. The proposed multilevel current-mode interconnect achieves higher throughput and lower latency than the two reference interconnects. Its throughput at 8 mm wire length is 1.222 GWord/s, which is 1.58 and 1.89 times higher than the pipelined and optimal repeater insertion interconnects, respectively. Furthermore, its power consumption is lower than that of the optimal repeater insertion voltage-mode interconnect: at 10 mm wire length it consumes 0.75 mW, while the reference repeater insertion interconnect consumes 1.066 mW. The effect of crosstalk is analyzed using four-bit parallel data transfer with the best-case and worst-case switching patterns and a transmission line model that has both capacitive and inductive coupling.
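    The quoted figures can be turned into implied reference values with a little arithmetic. The sketch below is only an illustration: it assumes that "N times higher" means the proposed throughput equals N times the reference throughput, which the abstract does not state explicitly, and all function and variable names are made up for this example.

```python
# Figures quoted in the abstract above.
PROPOSED_THROUGHPUT_GWORD_S = 1.222  # proposed link, 8 mm wire
RATIO_VS_PIPELINED = 1.58            # vs. pipelined voltage-mode link
RATIO_VS_REPEATER = 1.89             # vs. optimal repeater insertion link
PROPOSED_POWER_MW = 0.75             # proposed link, 10 mm wire
REPEATER_POWER_MW = 1.066            # repeater insertion link, 10 mm wire


def implied_reference_throughput(proposed: float, ratio: float) -> float:
    """Reference throughput under the ASSUMPTION proposed = ratio * reference."""
    return proposed / ratio


def relative_power_saving(proposed_mw: float, reference_mw: float) -> float:
    """Fraction of the reference power that the proposed link saves."""
    return 1.0 - proposed_mw / reference_mw


pipelined = implied_reference_throughput(PROPOSED_THROUGHPUT_GWORD_S, RATIO_VS_PIPELINED)
repeater = implied_reference_throughput(PROPOSED_THROUGHPUT_GWORD_S, RATIO_VS_REPEATER)
saving = relative_power_saving(PROPOSED_POWER_MW, REPEATER_POWER_MW)

print(f"implied pipelined throughput: {pipelined:.3f} GWord/s")
print(f"implied repeater throughput:  {repeater:.3f} GWord/s")
print(f"power saving at 10 mm:        {saving:.1%}")
```

    Under that reading, the two reference links would run at roughly 0.77 and 0.65 GWord/s, and the proposed link would use about 30% less power than the repeater-based one at 10 mm.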

    Smart Data: A New Perspective of Tackling the Big Data Phenomena Leveraging a Fog Computing System

    The management of Big Data is a very important issue in emerging IoT technologies. Conventional methods are not sufficient to deal with the ever-increasing amount of raw data originating from sensors. In this paper we approach this problem from the data structure perspective. We design and develop a concept that we call “Smart Data”. Smart Data is an active and intelligent data structure that uses a fog computing system to facilitate the management of Big Data in IoT-based applications. Such a data cell is initially very simple and lightweight, but it evolves (grows) when traveling through the hierarchical fog computing system towards the cloud, merging with other cells, or vice versa if the data moves from the cloud towards the actuators. Using Smart Data, we aim to facilitate the preprocessing of data in order to reduce the load on cloud computing and improve the quality of service and energy efficiency of IoT applications. Our main targets for preprocessing Big Data using Smart Data and a fog computing platform include data filtering, aggregation, compression, and encryption. Moreover, our design goal is to reduce the volume and velocity, and to increase the value and veracity, of Big Data while considering other parameters such as energy efficiency, throughput, scalability and quality of service.

    Energy-efficient Post-failure Reconfiguration of Swarms of Unmanned Aerial Vehicles

    In this paper, the reconfiguration of swarms of unmanned aerial vehicles after simultaneous failures of multiple nodes is considered. The objectives of the post-failure reconfiguration are to provide collision avoidance and smooth, energy-efficient movement. To this end, three different failure recovery algorithms are proposed: the thin-plate spline, distance-optimal, and time-optimal algorithms. These methods are tested on six swarms, with two variations of failing nodes for each swarm. Simulation results of the reconfiguration show that executing these algorithms maintains the desired formations while avoiding collisions at run-time. The results also show their effectiveness with respect to distance travelled, kinetic energy, and energy efficiency. As expected, the distance-optimal algorithm gives the shortest movements, and the time-optimal algorithm gives the most energy-efficient movements. The thin-plate spline is also found to be energy-efficient and has a lower computational cost than the other two proposed methods. Despite the suggested heuristics, the other two methods are combinatorial in nature and might be hard to use in practice. Furthermore, the use of the regularization parameter λ in the thin-plate spline is also investigated, and it is found that too large values of λ can lead to incorrect locations, including multiple nodes at the same location. In fact, it is found that using λ = 0 worked well in all cases.
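    To make the "combinatorial in nature" remark concrete, the sketch below solves a distance-minimizing assignment of surviving nodes to target positions by exhaustive search. It is not the paper's algorithm: the function name, the 2D coordinates, and the total-distance objective are assumptions chosen for illustration, and collision avoidance is ignored entirely.

```python
from itertools import permutations
from math import dist


def distance_optimal_assignment(current, targets):
    """Brute-force assignment minimizing the total straight-line distance.

    Hypothetical helper, not taken from the paper. Returns (assignment, total),
    where assignment[i] is the index of the target given to node i.
    Every one of the n! permutations is tried, so this is only
    practical for very small swarms.
    """
    if len(current) != len(targets):
        raise ValueError("need exactly one target per surviving node")
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(len(targets))):
        total = sum(dist(current[i], targets[j]) for i, j in enumerate(perm))
        if total < best_total:
            best_perm, best_total = perm, total
    return list(best_perm), best_total


# Made-up example: three surviving nodes and three new formation slots.
nodes = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
slots = [(4.0, 1.0), (0.0, 1.0), (0.0, 4.0)]
assignment, total = distance_optimal_assignment(nodes, slots)
print(assignment, total)
```

    The factorial growth of the search space is one way to see why an exact approach of this kind may be hard to use on large swarms without heuristics.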

    Thermal modeling and analysis of advanced 3D stacked structures

    Emerging three-dimensional integrated circuits (3D ICs) offer a promising solution to mitigate the barriers of interconnect scaling in modern systems. They also provide greater design flexibility by allowing heterogeneous integration. However, 3D technology exacerbates on-chip thermal issues and increases packaging and cooling costs. In this work, a 3D thermal model of a stacked system is developed, and thermal analysis is performed using finite element simulations in order to analyze different workload conditions. A steady-state heat transfer analysis of the 3D stacked structure has been performed in order to analyze the effect of variations in die power consumption, with and without hotspots, on the temperature in different layers of the stack. We have also investigated the effect that the interaction of hotspots has on peak temperature.

    Heterogeneous parallelization for object detection and tracking in UAVs.

    Recent technical advancements in both unmanned aerial vehicle (UAV) control and artificial intelligence (AI) have made a certain realm of applications possible. However, one of the main problems in integrating these two areas is the bottleneck of computing AI applications on a UAV's resource-limited platform. One of the main solutions to this problem is to adapt the AI and control software on one side and the computing hardware mounted on the UAV on the other side together, based on the main constraints of the resource-limited computing platform on the UAV. The target constraints of such adaptation are performance, energy efficiency, and accuracy. In this paper, we propose a strategy to integrate and adapt a commonly used object detection and tracking algorithm and UAV control software for execution on heterogeneous resource-limited computing units on a UAV. For object detection, a convolutional neural network (CNN) algorithm is used. For object tracking, a novel algorithm is proposed that can execute along with object tracking via sequential stream data. For UAV control, a Gain-Scheduled PID controller is designed that steers the UAV by continuously manipulating the actuators based on the stream data from the tracking unit and the dynamics of the UAV. All the algorithms are adapted to be executed on a heterogeneous platform comprising an NVIDIA Jetson TX2 embedded computer and an ARM Cortex M4. Observations from real-time operation show that the proposed platform reduces power consumption by 53.69% compared with other existing methods, with a marginal penalty for the object detection and tracking parts.
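    As a rough illustration of what gain scheduling means for a PID loop, the sketch below switches between gain sets according to a scheduling variable. It is not the controller from the paper: the class names, the choice of scheduling variable, and all numeric gains are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gains:
    kp: float
    ki: float
    kd: float


class GainScheduledPID:
    """PID controller whose gains depend on a scheduling variable.

    `schedule` is a list of (upper_bound, Gains) pairs: the first entry whose
    upper_bound is >= the scheduling variable supplies the gains, and the
    last entry is the fallback. All of this is an illustrative assumption.
    """

    def __init__(self, schedule):
        self.schedule = sorted(schedule, key=lambda entry: entry[0])
        self.integral = 0.0
        self.prev_error = None

    def _select(self, sched_var):
        for upper_bound, gains in self.schedule:
            if sched_var <= upper_bound:
                return gains
        return self.schedule[-1][1]

    def update(self, error, sched_var, dt):
        gains = self._select(sched_var)
        self.integral += error * dt
        # No derivative term on the first sample, since there is no history yet.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return gains.kp * error + gains.ki * self.integral + gains.kd * derivative


# Hypothetical schedule: stiffer gains when close to the tracked object.
pid = GainScheduledPID([
    (1.0, Gains(kp=2.0, ki=0.0, kd=0.0)),           # distance <= 1.0
    (float("inf"), Gains(kp=0.5, ki=0.0, kd=0.0)),  # anything farther
])
near = pid.update(error=0.5, sched_var=0.5, dt=0.1)
far = pid.update(error=4.0, sched_var=4.0, dt=0.1)
print(near, far)
```

    Here the distance to the tracked object plays the role of the scheduling variable; a real design would pick the variable and the gain table from the vehicle's dynamics.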

    An AI-in-Loop Fuzzy-Control Technique for UAV’s Stabilization and Landing

    In this paper, an adaptable fuzzy control mechanism for an Unmanned Aerial Vehicle (UAV) to manipulate its mechanical actuators is provided. The mission (landing) for the UAV is defined as tracking (landing on) an object that is detected by a deep learning object detection algorithm. The inputs of the controller are the location and speed of the UAV, calculated from the location of the detected object. Two separate fuzzy controllers are proposed to control the UAV’s motor throttle and its roll and pitch over the mission and landing time. A fuzzy logic controller (FLC) is an intelligent controller that can be used to compensate for the non-linear behaviour of the UAV by designing a specific fuzzy rule base. These rules are used to adjust the control parameters at runtime during the mission and landing period. To account for the effect of the ground when tuning the FLC membership function over the landing operation, computational fluid dynamics (CFD) modeling has been investigated. The proposed technique is evaluated on the MATLAB/Simulink simulation platform and in a real environment. Statistical analysis of the UAV location reported during the stabilization and landing process, on both the simulation and the real platform, shows that the proposed technique outperforms similar state-of-the-art control techniques for both mission and landing control.